An Evaluation of Parser Robustness for Ungrammatical Sentences
Abstract
For many NLP applications that require a parser, the sentences of interest may not be well-formed. If the parser can overlook problems such as grammar mistakes and produce a parse tree that closely resembles the correct analysis of the intended sentence, we say that the parser is robust. This paper compares the performance of eight state-of-the-art dependency parsers on two domains of ungrammatical sentences: learner English and machine translation output. We have developed an evaluation metric and conducted a suite of experiments. Our analyses may help practitioners choose an appropriate parser for their tasks and help developers improve parser robustness against ungrammatical sentences.
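The definition of robustness above compares the parse of an ungrammatical sentence with the analysis of the sentence the writer intended. One way to make that comparison concrete is sketched below, assuming each dependency parse is given as a list of head positions and that a word alignment between the ungrammatical and corrected sentences is available. The function names, the alignment format, and the rule of ignoring arcs that touch error words are illustrative assumptions, not the metric proposed in the paper.

```python
from typing import Dict, List, Optional, Set, Tuple

Arc = Tuple[int, int]  # (head, dependent), 1-based token positions; head 0 is ROOT


def arcs_from_heads(heads: List[int]) -> Set[Arc]:
    """Turn a head list (heads[i] is the head of token i+1) into a set of arcs."""
    return {(h, d) for d, h in enumerate(heads, start=1)}


def robustness_prf(
    ungram_heads: List[int],
    correct_heads: List[int],
    alignment: Dict[int, Optional[int]],
) -> Tuple[float, float, float]:
    """Hypothetical robustness score: precision, recall and F1 of the parse of
    the ungrammatical sentence against the parse of the corrected sentence.

    `alignment` maps each ungrammatical token position to its position in the
    corrected sentence, or to None if the token is part of an error.
    Assumption: arcs touching an unaligned (error) token are ignored.
    """
    def map_pos(i: int) -> Optional[int]:
        # ROOT (0) always maps to ROOT; other positions go through the alignment.
        return 0 if i == 0 else alignment.get(i)

    # Arcs of the ungrammatical parse, translated into corrected-sentence positions.
    predicted: Set[Arc] = set()
    for h, d in arcs_from_heads(ungram_heads):
        mh, md = map_pos(h), map_pos(d)
        if mh is not None and md is not None:
            predicted.add((mh, md))

    # Reference arcs: only those whose endpoints both have an aligned counterpart.
    aligned_targets = {j for j in alignment.values() if j is not None} | {0}
    reference = {
        (h, d)
        for h, d in arcs_from_heads(correct_heads)
        if h in aligned_targets and d in aligned_targets
    }

    matched = len(predicted & reference)
    p = matched / len(predicted) if predicted else 0.0
    r = matched / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# "He goes to home" (extra word "to") versus the intended "He goes home".
ungram = [2, 0, 2, 3]           # He->goes, goes->ROOT, to->goes, home->to
correct = [2, 0, 2]             # He->goes, goes->ROOT, home->goes
align = {1: 1, 2: 2, 3: None, 4: 3}
print(robustness_prf(ungram, correct, align))
```

In this example precision is 1.0 and recall is 2/3: the only reference arc the parser misses is the attachment of "home", which it hung on the inserted word "to".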
Publication date: 2016